Abstract

Simulated annealing is a popular Monte Carlo algorithm for combinatorial
optimization. The annealing algorithm simulates a nonstationary finite state
Markov chain whose state space $\Omega$ is the domain of the cost function to be
minimized. We analyze this chain focusing on those issues most important for
optimization. In all of our results we consider an arbitrary partition
$\{I,J\}$ of $\Omega$; important special cases are when $I$ is the set of minimum
cost states or a set of all states with sufficiently small cost. We give a lower
bound on the probability that the chain visits $I$ at some time $\le k$, for
$k = 1, 2, \ldots$. This bound may be useful even when the algorithm does not
converge. We give conditions under which the chain converges to $I$ in
probability and obtain an estimate of the rate of convergence as well. We
also give conditions under which the chain visits $I$ infinitely often,
visits $I$ almost always, or does not converge to $I$, with probability 1.
1. Introduction

Simulated annealing, as proposed by Kirkpatrick [1], is a popular
Monte-Carlo algorithm for combinatorial optimization. Simulated annealing is
a variation on an algorithm introduced by Metropolis [2] for approximate
computation of mean values of various statistical-mechanical quantities for a
physical system in equilibrium at a given temperature. In simulated
annealing the temperature of the system is slowly decreased to zero; if the
temperature is decreased slowly enough the system should end up among the
minimum energy states or at least among states of sufficiently low energy.
Hence the annealing algorithm can be viewed as minimizing a cost function
(energy) over a finite set (the system's states). Simulated annealing has
been applied to several combinatorial optimization problems including the
traveling salesman problem [2], computer design problems [2],[3], and image
reconstruction problems [4] with apparently good results.

The annealing algorithm consists of simulating a nonstationary
finite-state Markov chain which we shall call the annealing chain. We now
describe the precise relationship between this chain and the finite
optimization problem to be solved. Here and in the sequel we shall take
$\mathbb{R}$ to be the real numbers, $\mathbb{N}$ the natural numbers, and
$\mathbb{N}_0 = \mathbb{N} \cup \{0\}$, and we shall denote by $|A|$ the
cardinality of a finite set $A$. Let $\Omega$ be a finite set, say
$\Omega = \{1, \ldots, |\Omega|\}$, and $U_i \in \mathbb{R}$ for $i \in \Omega$;
we want to minimize $U_i$ over $i \in \Omega$. Let $T_k > 0$ for
$k \in \mathbb{N}_0$. $\Omega$ shall be the state space for the annealing chain
and we shall refer to $\{U_i\}_{i \in \Omega}$ as the energy function and
$\{T_k\}_{k \in \mathbb{N}_0}$ as the annealing schedule of temperatures.

Let $\pi^{(k)} = [\pi_i^{(k)}]_{i \in \Omega}$ (a row vector) be a Boltzmann
distribution over the energies at temperature $T_k$, i.e.,

\[ \pi_i^{(k)} = \frac{e^{-U_i/T_k}}{\sum_{j \in \Omega} e^{-U_j/T_k}}, \qquad i \in \Omega, \]

for all $k \in \mathbb{N}_0$. The annealing chain will be constructed such that
at each time $k$ the chain has $\pi^{(k)}$ as its unique invariant distribution,
i.e., at each time $k$ the annealing chain shall have a 1-step transition matrix
$P^{(k,k+1)} = [p_{ij}^{(k,k+1)}]_{i,j \in \Omega}$ such that $\Pi = \pi^{(k)}$
is the unique solution of the vector equation $\Pi = \Pi P^{(k,k+1)}$. The
motivation for this is as follows. Let $S^*$ be the minimum energy states in
$\Omega$. Now if $T_k \to 0$ as $k \to \infty$ then

\[ \pi_i^{(k)} \to \begin{cases} 1/|S^*| & \text{if } i \in S^*, \\ 0 & \text{if } i \notin S^*, \end{cases} \]

as $k \to \infty$, i.e., the invariant distributions converge to a uniform
distribution over the minimum energy states. The hope is then that the chain
itself converges to the minimum energy states.
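As a small numerical illustration of this limit, the following sketch evaluates a Boltzmann distribution at decreasing temperatures. The energy values are hypothetical and chosen only so that two states share the minimum energy; the function name `boltzmann` is likewise an assumption made for this example.

```python
import math

def boltzmann(energies, temperature):
    """Return the Boltzmann distribution, pi_i proportional to exp(-U_i / T)."""
    u_min = min(energies)
    # Shifting by the minimum energy does not change the normalized
    # distribution and avoids overflow at small temperatures.
    weights = [math.exp(-(u - u_min) / temperature) for u in energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical energies: states 0 and 4 are the minimum energy states.
energies = [0.0, 2.0, 1.0, 3.0, 0.0]

for temperature in (10.0, 1.0, 0.1, 0.01):
    pi = boltzmann(energies, temperature)
    print(temperature, [round(p, 4) for p in pi])
```

As the temperature decreases, the printed mass concentrates on the two minimum energy states and splits evenly between them.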

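We now show how Metropolis constructs a transition matrix $P^{(k,k+1)}$
with invariant vector $\pi^{(k)}$ for $k \in \mathbb{N}_0$. Let
$Q = [q_{ij}]_{i,j \in \Omega}$ be a symmetric and irreducible stochastic
matrix, and let

\[ p_{ij}^{(k,k+1)} = \begin{cases}
q_{ij}\, e^{-(U_j - U_i)/T_k} & \text{if } U_j > U_i, \\
q_{ij} & \text{if } U_j \le U_i,\ j \ne i, \\
1 - \sum_{l \ne i} p_{il}^{(k,k+1)} & \text{if } j = i,
\end{cases} \]

for all $i,j \in \Omega$ and $k \in \mathbb{N}_0$. Then it is easily verified
that $\pi^{(k)} = \pi^{(k)} P^{(k,k+1)}$ for all $k \in \mathbb{N}_0$. In fact,
$P^{(k,k+1)}$ and $\pi^{(k)}$ satisfy the reversibility condition

\[ p_{ji}^{(k,k+1)} \pi_j^{(k)} = \pi_i^{(k)} p_{ij}^{(k,k+1)}, \qquad i,j \in \Omega, \]

for all $k \in \mathbb{N}_0$. Let $\{x_k\}_{k \in \mathbb{N}_0}$ be the
annealing chain with 1-step transition matrices
$\{P^{(k,k+1)}\}_{k \in \mathbb{N}_0}$ and some initial distribution,
constructed on a suitable probability space $(M, \mathcal{A}, P)$. Let
$p_i^{(k)} = P\{x_k = i\}$ for $i \in \Omega$ and $k \in \mathbb{N}_0$.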
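The invariance and reversibility claims can be checked numerically. The sketch below builds a Metropolis-type transition matrix from a symmetric stochastic matrix and compares it against the Boltzmann distribution at one fixed temperature. The three-state matrix `q`, the energies, and all function names are assumptions introduced for illustration only.

```python
import math

def boltzmann(energies, temperature):
    """Boltzmann distribution, pi_i proportional to exp(-U_i / T)."""
    u_min = min(energies)
    weights = [math.exp(-(u - u_min) / temperature) for u in energies]
    total = sum(weights)
    return [w / total for w in weights]

def metropolis_matrix(q, energies, temperature):
    """One-step transition matrix: uphill moves are damped by exp(-dU / T)."""
    n = len(energies)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i:
                uphill = max(0.0, energies[j] - energies[i])
                p[i][j] = q[i][j] * math.exp(-uphill / temperature)
        # The self-transition absorbs the remaining probability mass.
        p[i][i] = 1.0 - sum(p[i][j] for j in range(n) if j != i)
    return p

def detailed_balance_gap(p, pi):
    """Largest violation of pi_i p_ij = pi_j p_ji over all pairs."""
    n = len(pi)
    return max(abs(pi[i] * p[i][j] - pi[j] * p[j][i])
               for i in range(n) for j in range(n))

def invariance_gap(p, pi):
    """Largest componentwise difference between pi P and pi."""
    n = len(pi)
    return max(abs(sum(pi[i] * p[i][j] for i in range(n)) - pi[j])
               for j in range(n))

# Hypothetical symmetric, irreducible Q on three states.
q = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
energies = [0.0, 1.0, 2.0]
temperature = 1.0

p = metropolis_matrix(q, energies, temperature)
pi = boltzmann(energies, temperature)
print(detailed_balance_gap(p, pi), invariance_gap(p, pi))
```

Both printed gaps should be at the level of floating-point rounding error.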


The annealing chain is simulated as follows. Suppose $x_k = i \in \Omega$. Then
generate a random variable $y \in \Omega$ with $P\{y = j\} = q_{ij}$ for
$j \in \Omega$. Suppose $y = j \in \Omega$. Then set

\[ x_{k+1} = \begin{cases}
j & \text{if } U_j \le U_i, \\
j & \text{if } U_j > U_i, \text{ with probability } e^{-(U_j - U_i)/T_k}, \\
i & \text{otherwise.}
\end{cases} \]

Hence we may think of the annealing algorithm as a "probabilistic descent"
algorithm where the $Q$ matrix represents some prior distribution of
"directions", transitions to same or lower energy states are always allowed,
and transitions to higher energy states are allowed with positive probability
which tends to 0 as $k \to \infty$ (when $T_k \to 0$ as $k \to \infty$).
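The simulation rule just described can be sketched in a few lines. This is a minimal illustration, assuming a logarithmic schedule $T_k = T / \log(k + k_0)$ with $k_0 > 1$; the five-state path graph, its energies, and the function names are hypothetical. The sketch also records the lowest-energy state visited so far, which costs only one extra stored state.

```python
import math
import random

def anneal_step(state, q, energies, temperature, rng):
    """One step: propose j from row q[state], accept uphill moves at random."""
    n = len(energies)
    candidate = rng.choices(range(n), weights=q[state])[0]
    delta = energies[candidate] - energies[state]
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return candidate
    return state

def anneal(q, energies, big_t, k0, steps, start, seed=0):
    """Run the chain under T_k = big_t / log(k + k0); return (final, best)."""
    rng = random.Random(seed)
    state = best = start
    for k in range(steps):
        temperature = big_t / math.log(k + k0)
        state = anneal_step(state, q, energies, temperature, rng)
        if energies[state] < energies[best]:
            best = state  # lowest-energy state visited so far
    return state, best

# Hypothetical path graph on five states; rows of q sum to one.
q = [[0.5, 0.5, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.5, 0.5]]
energies = [1.0, 3.0, 2.0, 4.0, 0.0]

final_state, best_state = anneal(q, energies, big_t=4.0, k0=2, steps=2000, start=0)
print(final_state, best_state)
```

The returned `best_state` can never have higher energy than the starting state or the final state; whether it is a global minimum depends on the schedule and the run length.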

Even though simulated annealing was proposed as a heuristic, its apparent
success in dealing with hard combinatorial optimization problems makes it
desirable to understand in a rigorous fashion why it works. The recent works
of Geman [4], Gidas [5], and Mitra et al. [6] have approached this problem
by showing the existence of an annealing schedule for which the annealing
chain converges weakly to the same limit as the sequence of invariant
distributions $\{\pi^{(k)}\}_{k \in \mathbb{N}_0}$, i.e., to a uniform
distribution over $S^*$. In each case a (different) constant $c$ is given such
that if $T_k \ge c / \log k$ for large enough $k \in \mathbb{N}$ and
$T_k \to 0$ as $k \to \infty$ then

\[ p_i^{(k)} \to \begin{cases} 1/|S^*| & \text{if } i \in S^*, \\ 0 & \text{if } i \notin S^*, \end{cases} \]

as $k \to \infty$. Furthermore, under an annealing schedule of the form
$T_k = T / \log(k + k_0)$ where $T \ge c$ and $k_0 > 1$, Mitra et al. obtain an
upper bound on $\sum_{i \in \Omega} |p_i^{(k)} - \pi_i^{(k)}|$ for
$k \in \mathbb{N}_0$. The results of Geman, Gidas, and Mitra et al.
are an extension of weak convergence results for stationary aperiodic
irreducible chains [7] and certain nonstationary chains [8], and are useful
in proving ergodic theorems (which Gidas does). However, if one is simply
interested in finding any minimum energy state then weak convergence seems
unnecessarily strong. In a recent paper Hajek [9] investigates when the
annealing chain converges in probability to $S^*$. Hajek gives an expression
for a constant $d^*$ such that under the annealing schedule
$T_k = T / \log k$ for large enough $k \in \mathbb{N}$,
$P\{x_k \in S^*\} \to 1$ as $k \to \infty$ iff $T \ge d^*$.
Furthermore the condition that $Q$ be symmetric is relaxed to what is called
"weak reversibility".

In this paper, we analyze simulated annealing focusing on optimization
issues. Here we are not so much interested in the statistics of individual
states as in that of certain groups of states, such as the set $S^*$ of
minimum energy states or more generally a set $S$ of all states with
sufficiently low energy. In all of our results we consider an arbitrary
partition $\{I,J\}$ of $\Omega$, and examine the behavior of the annealing chain
relative to this partition; we obtain results for $I = S$ as a special case.
We investigate both finite-time and asymptotic behavior as it depends on the
$Q$ matrix and the annealing schedule of temperatures
$\{T_k\}_{k \in \mathbb{N}_0}$.

In Section 2 we establish notation. In Section 3 we examine finite-time
behavior. We observe that since we may keep track of the minimum energy
state visited up to time $k$, it seems more appropriate to lower bound the
probability of visiting $S$ at some time $n \le k$, rather than the probability
of visiting $S$ at time $k$. Under an annealing schedule of the form
$T_k = T / \log(k + k_0)$ where $T > 0$ and $k_0 > 1$, we obtain a lower bound
on $P\{x_n \in I, \text{ some } n \le k\}$ for $k \in \mathbb{N}_0$. For large
$T$ this bound converges to 1 exponentially fast. For small $T$ the bound
converges to a positive value. Hence the bound is potentially useful even for
small $T$ when the algorithm may not converge. In Section 4 we examine
asymptotic behavior. First, we show that under suitable conditions on $Q$ there
exists a constant $U^*$ such that if $T_k \ge U^* / \log k$ for large enough
$k \in \mathbb{N}$, then the probability that $x_k \in I$ infinitely often is 1.
Second, we show that under suitable conditions on $Q$ if $T > U^*$ and
$T_k = T / \log k$ for large enough $k \in \mathbb{N}$, then $x_k$ converges in
probability to $I$. In fact, we show that
$P\{x_k \in I\} = 1 - O(k^{-r/T})$ as $k \to \infty$, where $r > 0$ does not
depend on $T$ and only depends on $Q$ through the set
$\{(i,j) \in \Omega \times \Omega : q_{ij} > 0\}$ of ordered pairs of allowed
transitions. Third, we show that under suitable conditions on $Q$ there exists
a constant $U_*$ such that if $U^* < T < U_*$ and $T_k = T / \log k$ for large
enough $k \in \mathbb{N}$, then the probability that $x_k \in I$ almost always
is 1. Hence we obtain three results about the convergence of the annealing
algorithm with increasingly stronger assumptions and conclusions. In Section 4
we also obtain a converse which gives conditions under which the annealing
algorithm does not converge: we show that under suitable conditions on $Q$
there exists a constant $W^*$ such that if $\epsilon > 0$ and
$T_k \le (W^* - \epsilon) / \log k$ for large enough $k \in \mathbb{N}$, then
the probability that $x_k \in I$ infinitely often is $< 1$. Finally, we briefly
compare our results to Hajek's work and indicate some directions for further
research. We remark that Sections 3 and 4 are essentially independent of each
other.




2. Notation and Preliminaries

In this section we describe notation which is necessary to state our
results, give a few examples of this notation, and discuss a technical
condition which we shall often impose in the sequel.

Let $\underline{U} = \min_{i \in \Omega} U_i$ and
$\overline{U} = \max_{i \in \Omega} U_i$. Then
$S^* = \{i \in \Omega : U_i = \underline{U}\}$ and
$S = \{i \in \Omega : U_i \le U\}$ for some
$\underline{U} \le U < \overline{U}$. Following standard notation, we shall
define $P^{(k,k+d)} = [p_{ij}^{(k,k+d)}]_{i,j \in \Omega}$ to be the d-step
transition matrix starting at time $k$, i.e.,

\[ P^{(k,k+d)} = P^{(k,k+1)} \cdots P^{(k+d-1,k+d)}. \]

In defining the annealing chain in Section 1 we assumed that
the stochastic matrix $Q$ was symmetric and irreducible. This assumption is
unnecessarily strong for our purposes. If $\{I,J\}$ is a partition of $\Omega$
and we want $x_k$ to converge to $I$ as $k \to \infty$, then we need only
require some kind of condition which guarantees transitions can be made from
$J$ to $I$, and possibly another condition which makes transitions from $J$ to
$I$ more likely than transitions from $I$ to $J$, depending on the mode of
convergence. We will be more precise later in Section 4; for now assume $Q$ is
an arbitrary stochastic matrix. For each $i,j \in \Omega$ we shall say that $i$
can reach $j$ if there exists a sequence of states
$i = i_0, i_1, \ldots, i_k = j$ such that $q_{i_n i_{n+1}} > 0$ for all
$n = 0, \ldots, k-1$; if $U \in \mathbb{R}$ and $U_{i_n} \le U$ for all
$n = 0, \ldots, k$ then we shall say that $i$ can reach $j$ at energy $U$.

Let $k \in \mathbb{N}_0$, and for every $d \in \mathbb{N}$ and
$i,j \in \Omega$ let $\Lambda_{ij}^{(d)}$ be the sequences of states
$i = i_0, \ldots, i_d = j$ such that $p_{i_n i_{n+1}}^{(k,k+1)} > 0$ for all
$n = 0, \ldots, d-1$. $\Lambda_{ij}^{(d)}$ are the sequences of allowed
transitions of length $d$ from $i$ to $j$ at positive temperature (we defined
$T_k > 0$ for all $k \in \mathbb{N}_0$). For every $d \in \mathbb{N}$ and
$i,j \in \Omega$ let $M_{ij}^{(d)}$ be the sequences of states
$i = i_0, \ldots, i_d = j$ such that $q_{i_n i_{n+1}} > 0$ for all
$n = 0, \ldots, d-1$. We might think of $M_{ij}^{(d)}$ as the sequences of
allowed transitions of length $d$ from $i$ to $j$ at infinite temperature. Note
that $M_{ij}^{(d)} \subset \Lambda_{ij}^{(d)}$ and the elements of
$\Lambda_{ij}^{(d)} \setminus M_{ij}^{(d)}$ are precisely those sequences which
have a self transition, say from $s \to s$, with $q_{ss} = 0$ and $q_{st} > 0$
for some $t \in \Omega$ such that $U_t > U_s$.


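Now for $d \in \mathbb{N}$, $i,j \in \Omega$, and
$\lambda = (i_0, \ldots, i_d) \in \Lambda_{ij}^{(d)}$ let

\[ U(\lambda) = \sum_{n=0}^{d-1} \max[0, U_{i_{n+1}} - U_{i_n}], \]

\[ V(\lambda) = \max_{n=0,\ldots,d-1} \max[0, U_{i_{n+1}} - U_{i_0}], \]

\[ W(\lambda) = \max_{n=0,\ldots,d-1} \max[0, U_{i_{n+1}} - U_{i_n}]. \]

Also let

\[ U_{ij}^{(d)} = \begin{cases}
\min_{\lambda \in \Lambda_{ij}^{(d)}} U(\lambda) & \text{if } \Lambda_{ij}^{(d)} \ne \emptyset, \\
\infty & \text{if } \Lambda_{ij}^{(d)} = \emptyset,
\end{cases} \]

for all $d \in \mathbb{N}$, and

\[ U_{ij} = \inf_{d \in \mathbb{N}} U_{ij}^{(d)} = \min_{d \le |\Omega|} U_{ij}^{(d)}, \tag{2.1} \]

for all $i,j \in \Omega$. Similarly define $V_{ij}^{(d)}, V_{ij}$ and
$W_{ij}^{(d)}, W_{ij}$ by replacing $U$ by $V$ and $W$, respectively, in the
definitions of $U_{ij}^{(d)}, U_{ij}$ above.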
Finally, if one or both of the indices $i,j \in \Omega$ are replaced by
$I,J \subset \Omega$ in these definitions then an additional minimization is to
be performed over the elements of $I,J$, e.g.,
$U_{iJ}^{(d)} = \min_{j \in J} U_{ij}^{(d)}$,
$W_{IJ} = \min_{i \in I,\, j \in J} W_{ij}$, etc. Note that if we replace
$\Lambda_{ij}^{(d)}$ by $M_{ij}^{(d)}$ in the definitions of $U_{ij}^{(d)}$,
$V_{ij}^{(d)}$, and $W_{ij}^{(d)}$ then the values of these quantities will in
general be changed; however the values of $U_{ij}$, $V_{ij}$, and $W_{ij}$
will be unchanged. We shall refer to $U_{xy}$ ($U_{xy}^{(d)}$) as the
transition energy (d-step transition energy) from $x$ to $y$, for
$x,y \in \Omega \cup 2^{\Omega}$.
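To make the notation concrete, the sketch below computes the three path quantities (total uphill climb, maximum height above the starting state, and largest single uphill step) for a given sequence of states, and computes the transition energy between two distinct states as a shortest-path problem with edge weight $\max[0, U_j - U_i]$. The energies and the matrix `q` are hypothetical, the function names are assumptions, and the shortest-path search is restricted to `source != target` so that self-transitions (which have zero weight) can be ignored.

```python
import heapq

def path_energies(path, energies):
    """Return (U, V, W) for a sequence of states path = (i_0, ..., i_d)."""
    steps = list(zip(path, path[1:]))
    jumps = [max(0.0, energies[b] - energies[a]) for a, b in steps]
    heights = [max(0.0, energies[b] - energies[path[0]]) for b in path[1:]]
    return sum(jumps), max(heights), max(jumps)

def transition_energy(q, energies, source, target):
    """Minimum total uphill climb from source to target (source != target).

    Dijkstra's algorithm applies because every edge weight is nonnegative.
    Returns infinity if target cannot be reached from source.
    """
    n = len(energies)
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, a = heapq.heappop(heap)
        if d > dist[a]:
            continue  # stale heap entry
        if a == target:
            return d
        for b in range(n):
            if b != a and q[a][b] > 0:
                nd = d + max(0.0, energies[b] - energies[a])
                if nd < dist[b]:
                    dist[b] = nd
                    heapq.heappush(heap, (nd, b))
    return float("inf")

# Hypothetical path graph on five states (indices 0..4) with made-up energies.
q = [[0.5, 0.5, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.5, 0.5]]
energies = [1.0, 3.0, 2.0, 4.0, 0.0]

print(path_energies((0, 1, 2, 3, 4), energies))
print(transition_energy(q, energies, 0, 4), transition_energy(q, energies, 4, 0))
```

The asymmetry between the two printed transition energies is what makes one direction of travel more likely than the other at low temperature.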

Example 2.1 In Figure 2.1 we show a state transition diagram for
$\Omega = \{1, \ldots, 5\}$ where transitions are governed by the $Q$ matrix,
i.e., an edge from $i \in \Omega$ to $j \in \Omega$ is shown iff $q_{ij} > 0$,
in which case the edge is labelled with the value of $q_{ij}$. To obtain the
state transition diagram for the corresponding $P^{(k,k+1)}$ matrix,
$k \in \mathbb{N}_0$, simply add a self-transition loop to every state which
can make a transition to a higher energy state (if one is not already present)
and relabel the edges appropriately. The self-transitions which are allowed
under $P^{(k,k+1)}$ but not under $Q$ are depicted by broken loops. Also
observe that the ordinate axis gives the energy of the corresponding state. To
illustrate the notation we have

\[ \Lambda_{15}^{(5)} = \{(1,1,2,3,4,5),\ (1,2,3,3,4,5),\ (1,2,3,4,5,5)\}, \]

\[ M_{15}^{(5)} = \{(1,1,2,3,4,5)\}, \]

\[ U_{15} = U_{15}^{(5)} = (U_2 - U_1) + (U_4 - U_3) = 4, \]

\[ V_{15} = V_{15}^{(5)} = U_4 - U_1 = 3, \]

\[ W_{15} = W_{15}^{(5)} = U_2 - U_1 = 2. \]

Let $\{I,J\}$ be a partition of $\Omega$. In Section 4 we will often impose the
following condition: there exists $d \in \mathbb{N}$ such that the d-step
transition energy from $j$ to $I$ equals the transition energy from $j$ to $I$,
for all $j \in J$ ($U_{jI}^{(d)} = U_{jI}$ for all $j \in J$). This will allow
us to get lower bounds on the quantity
$P\{x_{(k+1)d} \in I \mid x_{kd} = j\}$ for all $j \in J$. It is easy to show
that if $I = S$ then this condition is satisfied. In fact, in this case there
exists $d_0 \le |J|$ such that for every $d \ge d_0$,
$U_{jI}^{(d)} = U_{jI}$ for all $j \in J$.

Example 2.2 In Figure 2.2 we show a state transition diagram for
$\Omega = \{1, \ldots, 7\}$ (see Example 2.1). Let
$I = S = \{i \in \Omega : U_i \le 2\} = \{1,2,3\}$, $J = \{4, \ldots, 7\}$. Then

\[ U_{6I}^{(d)} = U_{6I}, \quad U_{7I}^{(d)} = U_{7I}, \qquad d \ge 1, \]

\[ U_{5I}^{(d)} = U_{5I}, \qquad d \ge 2, \]

\[ U_{4I}^{(d)} = U_{4I}, \qquad d \ge 3, \]

and so $d_0 = 3$. Note that if we replace $\Lambda_{ij}^{(d)}$ by
$M_{ij}^{(d)}$ in the definition of $U_{ij}^{(d)}$ for $i,j \in \Omega$, then
there does not exist $d \in \mathbb{N}$ such that $U_{jI}^{(d)} = U_{jI}$ for
all $j \in J$.

3. Finite-time Behavior

From the point of view of applications it is important to understand the
finite-time behavior of the annealing algorithm. Certainly it is interesting
to know whether the annealing algorithm converges according to various
criteria, and this information may well give insight into finite-time
behavior. However this information may also be misleading for the following
reasons. First, the finite-time behavior of the annealing algorithm may be
quite satisfactory even when the algorithm does not converge, which may well
be the case for typical applications. Second, the finite-time behavior of
the annealing algorithm may not be clearly related to the convergence rate
when the algorithm does converge, as the following example indicates.

Example 3.1 It is a simple consequence of Proposition 4.1(ii) that if
$Q$ is symmetric and irreducible, $T > 0$, and $T_k \ge T / \log k$ for large
enough $k \in \mathbb{N}$, then there exists $a, \alpha > 0$ such that

\[ P\{x_k \in S^*\} \le 1 - \frac{a}{k^{\alpha}}, \qquad k \text{ large enough.} \]

Now let $P$ be the matrix obtained from $P^{(k,k+1)}$ by setting
$Q = [1/|\Omega|]$ and $T_k = 0$, and let $\{y_k\}_{k \in \mathbb{N}_0}$,
$y_k \in \Omega$, be a stationary Markov chain with 1-step transition matrix
$P$ and some initial distribution, constructed on $(M, \mathcal{A}, P)$. Since
$S^*$ is just the set of persistent states for this chain, it is well-known
that there exists $b > 0$ and $0 < \rho < 1$ such that

\[ P\{y_k \in S^*\} \ge 1 - b\rho^k, \qquad k \in \mathbb{N}_0. \]

Hence assuming that $T$ is chosen such that $P\{x_k \in S^*\} \to 1$ as
$k \to \infty$ then the rate that $P\{x_k \in S^*\} \to 1$ is at best
polynomial while the rate that $P\{y_k \in S^*\} \to 1$ is at worst
exponential. Of course we would hope that the finite-time behavior of the
annealing chain would be better than the stationary chain, for appropriate
choice of $Q$ and $T$.

We now address the question of what is an appropriate criterion to
assess the finite-time behavior of the annealing algorithm. For our
purposes, we are simply interested in finding any state of sufficiently low
energy, i.e., an element of $S$. Hence it seems reasonable to lower bound
$P\{x_k \in S\}$ for $k \in \mathbb{N}_0$. However, we observe that by just
doubling the annealing algorithm's memory requirements we can keep track of one
of the minimum energy states visited by the chain up to the current time. In
this case we are really interested in having visited $S$ at some time
$n \le k$, as opposed to actually occupying $S$ at time $k$. Hence it seems
more appropriate to lower bound $P\{x_n \in S, \text{ some } n \le k\}$ for
$k \in \mathbb{N}_0$.

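We start with a proposition which gives a lower bound on the d-step
transition probability $p_{ij}^{(k,k+d)}$ in terms of the transition energies
$U(\lambda)$ of sequences $\lambda \in \Lambda_{ij}^{(d)}$, for
$i,j \in \Omega$.

Proposition 3.1 Let $d \in \mathbb{N}$, $T > 0$, $k_0 > 1$, and
$T_k = T / \log(k + k_0)$ for $k \in \mathbb{N}_0$. Then for every
$i,j \in \Omega$

\[ p_{ij}^{(k,k+d)} \ge \sum_{\lambda \in \Lambda_{ij}^{(d)}} r(\lambda)\,(k + k_0 + d - 1)^{-U(\lambda)/T}, \qquad k \in \mathbb{N}_0, \tag{3.1} \]

where $r(\lambda) > 0$ is given in (3.2).

Proof Let

\[ r_{ij}^{(k)} = \begin{cases} q_{ij} & \text{if } j \ne i, \\ p_{ii}^{(k,k+1)} & \text{if } j = i, \end{cases} \]

for all $i,j \in \Omega$ and $k \in \mathbb{N}_0$. Also for every
$i,j \in \Omega$ and $\lambda = (i_0, \ldots, i_d) \in \Lambda_{ij}^{(d)}$ let

\[ \delta_n(\lambda) = \max[0, U_{i_{n+1}} - U_{i_n}], \qquad n = 0, \ldots, d-1, \]

\[ r_k(\lambda) = \prod_{n=0}^{d-1} r_{i_n i_{n+1}}^{(k+n)} > 0, \qquad k \in \mathbb{N}_0, \]

\[ r(\lambda) = \prod_{n=0}^{d-1} r_{i_n i_{n+1}}^{(n)}. \tag{3.2} \]

Since $T_k$ is strictly decreasing, $p_{ii}^{(k,k+1)}$ and hence
$r_{ij}^{(k)}$ are nondecreasing, so that $r_k(\lambda) \ge r(\lambda)$ for all
$k \in \mathbb{N}_0$. Hence for every $i,j \in \Omega$

\begin{align*}
p_{ij}^{(k,k+d)}
&= \sum_{(i_0,\ldots,i_d) \in \Lambda_{ij}^{(d)}} \prod_{n=0}^{d-1} p_{i_n i_{n+1}}^{(k+n,k+n+1)} \\
&= \sum_{(i_0,\ldots,i_d) \in \Lambda_{ij}^{(d)}} \prod_{n=0}^{d-1} r_{i_n i_{n+1}}^{(k+n)} \exp\left[ -\frac{1}{T_{k+n}} \max[0, U_{i_{n+1}} - U_{i_n}] \right] \\
&= \sum_{\lambda \in \Lambda_{ij}^{(d)}} r_k(\lambda) \exp\left[ -\sum_{n=0}^{d-1} \frac{\delta_n(\lambda)}{T_{k+n}} \right] \\
&\ge \sum_{\lambda \in \Lambda_{ij}^{(d)}} r_k(\lambda) \exp\left[ -\frac{1}{T_{k+d-1}} \sum_{n=0}^{d-1} \delta_n(\lambda) \right] \\
&\ge \sum_{\lambda \in \Lambda_{ij}^{(d)}} r(\lambda) \exp\left[ -\frac{\log(k + k_0 + d - 1)}{T}\, U(\lambda) \right] \\
&= \sum_{\lambda \in \Lambda_{ij}^{(d)}} r(\lambda)\,(k + k_0 + d - 1)^{-U(\lambda)/T}, \qquad k \in \mathbb{N}_0.
\end{align*}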


Remarks on Proposition 3.1 (1) In Figure 2.1 we have

r((l,2,3,4,5))  -  <5 3^34^4 5  ”  IS’ 
r((2,3,3,4,5))  -  ^sPm*  ^34^45  “  5 

rcca.3,4.8.8))  -  qas^Ws?’15  -  5 


1  - 


1 1 

kT7T  +  ^7T 
0  0 


1  - 


(2) Fix $k \in \mathbb{N}_0$. From (3.2) it is easy to see that $r(\lambda)$ is
nondecreasing as $T$ decreases or $k_0$ increases, which reflects the fact
that self-transitions in the sequence $\lambda$ have larger probability at lower
temperature. On the other hand, $(k + k_0 + d - 1)^{-U(\lambda)/T} \downarrow 0$
as $T \downarrow 0$ or $k_0 \uparrow \infty$ (if $U(\lambda) > 0$), which
reflects the fact that transitions to higher energy states in the sequence
$\lambda$ have smaller probability at lower temperature.
These two phenomena compete with each other in the lower bound (3.1).

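The next theorem gives a lower bound on
$P\{x_n \in S, \text{ some } n \le k\}$ for $k \in \mathbb{N}_0$ by setting
$I = S$.

Theorem 3.1 Let $d \in \mathbb{N}$, $U = \max_{j \in J} U_{jI}^{(d)}$,
$T > 0$, $k_0 > 1$, and $T_k = T / \log(k + k_0)$ for $k \in \mathbb{N}_0$.
Then

\[ P\{x_{nd} \in J,\ n = 0, \ldots, k\} \le \begin{cases}
\exp\left[ \dfrac{a}{d(1-\alpha)}\, n_0^{1-\alpha} \right] \exp\left[ -\dfrac{a}{d(1-\alpha)}\, (kd + n_0)^{1-\alpha} \right] & \text{if } T > U, \\[2ex]
\left( \dfrac{n_0}{kd + n_0} \right)^{a/d} & \text{if } T = U, \\[2ex]
\exp\left[ -\dfrac{a}{d(\alpha-1)}\, \dfrac{1}{n_0^{\alpha-1}} \right] \exp\left[ \dfrac{a}{d(\alpha-1)}\, \dfrac{1}{(kd + n_0)^{\alpha-1}} \right] & \text{if } T < U,
\end{cases} \tag{3.3} \]

for all $k \in \mathbb{N}_0$, where $\alpha = U/T$, $n_0 = k_0 + d - 1$, and
$a > 0$ is given in (3.5).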

Note In the statement of Theorem 3.1 and in the proof to follow we
suppress the dependence of the constants $U$ and $a$ on $d$. Later, we shall
make this dependence explicit by writing $U^{(d)}$ and $a^{(d)}$.


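Proof From Proposition 3.1 for every $i,j \in \Omega$

\[ p_{ij}^{(k,k+d)} \ge \sum_{\lambda \in \Lambda_{ij}^{(d)}} r(\lambda)\,(k + k_0 + d - 1)^{-U(\lambda)/T}, \qquad k \in \mathbb{N}_0, \]

where $r(\lambda) > 0$ is given in (3.2). Hence

\begin{align*}
P\{x_{nd} \in J,\ n = 0, \ldots, k\}
&\le \prod_{n=0}^{k-1} \max_{j \in J} P\{x_{(n+1)d} \in J \mid x_{nd} = j\} \\
&= \prod_{n=0}^{k-1} \left[ 1 - \min_{j \in J} \sum_{i \in I} p_{ji}^{(nd,(n+1)d)} \right] \\
&\le \prod_{n=0}^{k-1} \left[ 1 - \frac{a}{(nd + n_0)^{\alpha}} \right], \qquad k \in \mathbb{N}_0, \tag{3.4}
\end{align*}

where

\[ a = \min_{j \in J} \sum_{i \in I} \sum_{\substack{\lambda \in \Lambda_{ji}^{(d)} \\ U(\lambda) \le U}} r(\lambda) > 0 \tag{3.5} \]

(if $U = \infty$ let $a$ be any positive real). Since $1 + x \le e^x$ for all
$x \in \mathbb{R}$, we have

\begin{align*}
\prod_{n=0}^{k-1} \left[ 1 - \frac{a}{(nd + n_0)^{\alpha}} \right]
&\le \exp\left[ -a \sum_{n=0}^{k-1} \frac{1}{(nd + n_0)^{\alpha}} \right] \\
&\le \exp\left[ -a \int_0^k \frac{dx}{(xd + n_0)^{\alpha}} \right] \\
&= \begin{cases}
\exp\left[ \dfrac{a}{d(1-\alpha)}\, n_0^{1-\alpha} \right] \exp\left[ -\dfrac{a}{d(1-\alpha)}\, (kd + n_0)^{1-\alpha} \right] & \text{if } \alpha \ne 1, \\[2ex]
\left( \dfrac{n_0}{kd + n_0} \right)^{a/d} & \text{if } \alpha = 1,
\end{cases} \tag{3.6}
\end{align*}

for all $k \in \mathbb{N}_0$. Combining (3.4) and (3.6) completes the proof.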
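The chain of inequalities in this proof can be checked numerically. The sketch below compares the finite product $\prod_{n=0}^{k-1} [1 - a/(nd + n_0)^{\alpha}]$ with the closed form obtained by integrating $1/(xd + n_0)^{\alpha}$ over $[0, k]$. The parameter values are arbitrary illustrations rather than constants derived from any particular $Q$ matrix, and the function names are assumptions.

```python
import math

def product_bound(a, alpha, d, n0, k):
    """Finite product prod_{n=0}^{k-1} (1 - a / (n*d + n0)**alpha)."""
    result = 1.0
    for n in range(k):
        result *= 1.0 - a / (n * d + n0) ** alpha
    return result

def closed_form_bound(a, alpha, d, n0, k):
    """exp(-a * integral_0^k dx / (x*d + n0)**alpha), evaluated in closed form."""
    if alpha == 1.0:
        return (n0 / (k * d + n0)) ** (a / d)
    c = a / (d * (1.0 - alpha))
    return math.exp(c * n0 ** (1.0 - alpha) - c * (k * d + n0) ** (1.0 - alpha))

# Illustrative values only: alpha < 1, alpha = 1, and alpha > 1.
a, d, n0 = 0.1, 4, 5
for alpha in (0.5, 1.0, 2.0):
    for k in (1, 10, 100, 1000):
        print(alpha, k,
              product_bound(a, alpha, d, n0, k),
              closed_form_bound(a, alpha, d, n0, k))
```

For $\alpha \le 1$ the closed form decreases to zero as $k$ grows, while for $\alpha > 1$ it levels off at a value strictly between 0 and 1.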

Remarks on Theorem 3.1 (1) Let $I = S^* = \{5\}$, $J = \{1,2,3,4\}$, and
$d = 4$ in Figure 2.1. Then $U = U_{15}^{(4)} = 4$ and

\[ a = \min_{j \in \{1,2,3,4\}} \sum_{\substack{\lambda \in \Lambda_{j5}^{(4)} \\ U(\lambda) \le 4}} r(\lambda). \]

Now it is not hard to see that the minimum is obtained by $j = 1$ or $2$. Using
the values of $r(\lambda)$ computed in the first remark following
Proposition 3.1 we have

a  "  T5  min  l'  4  "  7477  '  7577  "  7177  • 

^0  *0  *Q 

(2) Note that

\[ P\{x_{nd} \in J,\ n \in \mathbb{N}_0\} = \lim_{k \to \infty} P\{x_{nd} \in J,\ n = 0, \ldots, k\}
\begin{cases}
= 0 & \text{if } T \ge U, \\[1ex]
\le \exp\left[ -\dfrac{a}{d(\alpha-1)}\, \dfrac{1}{n_0^{\alpha-1}} \right] < 1 & \text{if } T < U,
\end{cases} \]

so that the bound is potentially useful even when $T < U$.

(3) Fix $k \in \mathbb{N}_0$. It will be convenient to analyze the dependence
of the upper bound (3.3) on $T$ and $k_0$ in the form

\[ P\{x_{nd} \in J,\ n = 0, \ldots, k\} \le \exp\left[ -a \int_0^k \frac{dx}{(xd + n_0)^{\alpha}} \right] \tag{3.7} \]

(see (3.6)). Since $r(\lambda)$ is nondecreasing as $T$ decreases or $k_0$
increases, we have from (3.5) that $a$ is nondecreasing as $T$ decreases or
$k_0$ increases, which reflects the fact that self-transitions in sequences of
transitions from $J$ to $I$ have larger probability at lower temperature. On
the other hand, $\int_0^k \frac{dx}{(xd + n_0)^{\alpha}} \downarrow 0$ as
$T \downarrow 0$ or $k_0 \uparrow \infty$ (if $U > 0$), which reflects the fact
that transitions to higher energy states in sequences of transitions from $J$
to $I$ have smaller probability at lower temperature. Since these two
phenomena compete with each other one could consider minimizing the r.h.s. of
(3.7) over $T$ and $k_0$ to obtain the best bound.

(4) We can generalize Theorem 3.1 by replacing
$U = \max_{j \in J} U_{jI}^{(d)}$ with $U' \ge U$ (if $U' < U$ then $a = 0$ and
the upper bound (3.3) is useless). Since $a$ and $\alpha$ are both
nondecreasing with increasing $U'$ one could consider minimizing the r.h.s. of
(3.7) over $U'$ as well as $T$ and $k_0$ to obtain the best bound (see previous
remark).

In order to apply Theorem 3.1 we must obtain suitable estimates for the
constants $U^{(d)}$ and $a^{(d)}$. We are currently investigating this in the
context of a particular problem.


4. Asymptotic Analysis

In the previous section we pointed out some of the difficulties
associated with using the asymptotic behavior of the annealing algorithm to
predict its finite-time behavior. Nonetheless, it is certainly interesting
from a theoretical viewpoint to perform an asymptotic analysis, i.e., to find
conditions under which the annealing algorithm does or does not converge
according to various criteria, and when the algorithm converges to estimate
the rate of convergence as well. In this section we address these questions,
and then briefly compare our results to Hajek's work and indicate some
directions for further research.

We first address the question of what are appropriate criteria to assess
the asymptotic performance of the annealing algorithm. For our purposes, we
are simply interested in finding any state of sufficiently low energy, i.e.,
an element of $S$. Hence we shall investigate conditions on the $Q$ matrix
and the annealing schedule of temperatures $\{T_k\}_{k \in \mathbb{N}_0}$ under
which one or more of the following is true:

(i) $P\{x_k \in S \text{ i.o.}\} = 1$,

(ii) $P\{x_k \in S\} \to 1$ as $k \to \infty$,

(iii) $P\{x_k \in S \text{ a.a.}\} = 1$.

Here "i.o." and "a.a." are abbreviations for "infinitely often" and "almost
always", i.e.,

\[ \{x_k \in S \text{ i.o.}\} = \limsup_{k \to \infty} \{x_k \in S\} = \bigcap_{n=1}^{\infty} \bigcup_{k \ge n} \{x_k \in S\}, \]

\[ \{x_k \in S \text{ a.a.}\} = \liminf_{k \to \infty} \{x_k \in S\} = \bigcup_{n=1}^{\infty} \bigcap_{k \ge n} \{x_k \in S\}. \]

Since (c.f. [7])

\[ P\{x_k \in S \text{ a.a.}\} \le \liminf_{k \to \infty} P\{x_k \in S\} \le \limsup_{k \to \infty} P\{x_k \in S\} \le P\{x_k \in S \text{ i.o.}\}, \tag{4.1} \]

it follows that (i), (ii), and (iii) are increasingly strong results and so we
expect increasingly strong conditions under which each is true. We are also
interested in obtaining the rate of convergence in (ii) as well as conditions
under which (i), (ii), and (iii) do not hold.
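For readers who want the step behind (4.1), the outer inequalities are the standard Fatou-type bounds for events. A short derivation, writing $A_k = \{x_k \in S\}$, $A_k^c$ for its complement, and $1_{A_k}$ for its indicator (this shorthand is introduced here and is not the paper's notation), is:

```latex
\begin{align*}
P\Bigl\{\liminf_{k\to\infty} A_k\Bigr\}
  &= E\Bigl[\liminf_{k\to\infty} 1_{A_k}\Bigr]
   \le \liminf_{k\to\infty} E\bigl[1_{A_k}\bigr]
   = \liminf_{k\to\infty} P\{A_k\}
  && \text{(Fatou's lemma)}, \\
P\Bigl\{\limsup_{k\to\infty} A_k\Bigr\}
  &= 1 - P\Bigl\{\liminf_{k\to\infty} A_k^c\Bigr\}
   \ge 1 - \liminf_{k\to\infty} P\{A_k^c\}
   = \limsup_{k\to\infty} P\{A_k\}.
\end{align*}
```

The middle inequality in (4.1) holds for any real sequence, so the three together give the stated ordering.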

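We start by giving a proposition which establishes asymptotic upper and
lower bounds on the d-step transition probability $p_{ij}^{(k,k+d)}$ as
$k \to \infty$ in terms of the transition energy $U_{ij}$, for
$i,j \in \Omega$.

Proposition 4.1 Let $d \in \mathbb{N}$ and $T > 0$. Then there exists
$a_{ij} > 0$ for $i,j \in \Omega$ such that each of the following is true:

(i) if $T_k \le T / \log k$ for large enough $k \in \mathbb{N}$ then

\[ \limsup_{k \to \infty} k^{U_{ij}/T}\, p_{ij}^{(k,k+d)} \le a_{ij} \]

for all $i,j \in \Omega$,

(ii) if $T_k \ge T / \log k$ for large enough $k \in \mathbb{N}$ and
$T_k \to 0$ as $k \to \infty$ then

\[ \liminf_{k \to \infty} k^{U_{ij}/T}\, p_{ij}^{(k,k+d)} \ge a_{ij} \]

for all $i,j \in \Omega$ such that $U_{ij}^{(d)} = U_{ij}$,

(iii) if $T_k = T / \log k$ for large enough $k \in \mathbb{N}$ then

\[ p_{ij}^{(k,k+d)} \sim \frac{a_{ij}}{k^{U_{ij}/T}}, \qquad \text{as } k \to \infty, \]

for all $i,j \in \Omega$ such that $U_{ij}^{(d)} = U_{ij}$.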
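Proof We prove (i); the proof of (ii) is similar and (iii) follows
from (i) and (ii). So assume $T_k \le T / \log k$ for large enough
$k \in \mathbb{N}$ and let

\[ r_{ij}^{(k)} = \begin{cases} q_{ij} & \text{if } j \ne i, \\ p_{ii}^{(k,k+1)} & \text{if } j = i, \end{cases} \]

for all $i,j \in \Omega$ and $k \in \mathbb{N}$. Also, for every
$i,j \in \Omega$ and $\lambda = (i_0, \ldots, i_d) \in \Lambda_{ij}^{(d)}$ let

\[ \delta_n(\lambda) = \max[0, U_{i_{n+1}} - U_{i_n}], \qquad n = 0, \ldots, d-1, \]

\[ r_k(\lambda) = \prod_{n=0}^{d-1} r_{i_n i_{n+1}}^{(k+n)} > 0, \qquad k \in \mathbb{N}, \]

\[ r(\lambda) = \lim_{k \to \infty} r_k(\lambda) = \sup_{k \in \mathbb{N}} r_k(\lambda) > 0. \]

That the limit exists in the definition of $r(\lambda)$ and is equal to the
supremum is a consequence of
$\lim_{k \to \infty} p_{ii}^{(k,k+1)} = \sup_{k \in \mathbb{N}} p_{ii}^{(k,k+1)}$
(since $T_k \to 0$ as $k \to \infty$). Hence for every $i,j \in \Omega$

\begin{align*}
p_{ij}^{(k,k+d)}
&= \sum_{(i_0,\ldots,i_d) \in \Lambda_{ij}^{(d)}} \prod_{n=0}^{d-1} p_{i_n i_{n+1}}^{(k+n,k+n+1)} \\
&= \sum_{(i_0,\ldots,i_d) \in \Lambda_{ij}^{(d)}} \prod_{n=0}^{d-1} r_{i_n i_{n+1}}^{(k+n)} \exp\left[ -\frac{1}{T_{k+n}} \max[0, U_{i_{n+1}} - U_{i_n}] \right] \\
&= \sum_{\lambda \in \Lambda_{ij}^{(d)}} r_k(\lambda) \exp\left[ -\sum_{n=0}^{d-1} \frac{\delta_n(\lambda)}{T_{k+n}} \right] \\
&\le \sum_{\lambda \in \Lambda_{ij}^{(d)}} r_k(\lambda) \exp\left[ -\frac{\log k}{T} \sum_{n=0}^{d-1} \delta_n(\lambda) \right] \qquad (k \text{ large enough}) \\
&= \sum_{\lambda \in \Lambda_{ij}^{(d)}} \frac{r_k(\lambda)}{k^{U(\lambda)/T}} \\
&= \sum_{\substack{\lambda \in \Lambda_{ij}^{(d)} \\ U(\lambda) = U_{ij}^{(d)}}} \frac{r_k(\lambda)}{k^{U(\lambda)/T}}
 + \sum_{\substack{\lambda \in \Lambda_{ij}^{(d)} \\ U(\lambda) > U_{ij}^{(d)}}} \frac{r_k(\lambda)}{k^{U(\lambda)/T}} \\
&\sim \frac{a_{ij}}{k^{U_{ij}^{(d)}/T}}, \qquad \text{as } k \to \infty,
\end{align*}

where

\[ a_{ij} = \sum_{\substack{\lambda \in \Lambda_{ij}^{(d)} \\ U(\lambda) = U_{ij}^{(d)}}} r(\lambda) > 0 \]

(if $U_{ij}^{(d)} = \infty$ let $a_{ij}$ be any positive real).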

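The following theorem gives conditions under which
$P\{x_k \in S \text{ i.o.}\} = 1$ by setting $I = S$.

Theorem 4.1 Let $\{I,J\}$ be a partition of $\Omega$ and assume

(a) there exists $d \in \mathbb{N}$ such that the d-step transition energy from
$j$ to $I$ equals the transition energy from $j$ to $I$, for all $j \in J$
($U_{jI}^{(d)} = U_{jI}$ for all $j \in J$),

(b) every $j \in J$ can reach some $i \in I$
($\max_{j \in J} U_{jI} < \infty$).

Also let $U^* = \max_{j \in J} U_{jI} < \infty$, $T_k \ge U^* / \log k$ for
large enough $k \in \mathbb{N}$, and $T_k \to 0$ as $k \to \infty$. Then
$P\{x_k \in I \text{ i.o.}\} = 1$.

Proof From Proposition 4.1(ii) there exists $a > 0$ such that

\[ p_{ij}^{(k,k+d)} \ge \frac{a}{k^{U_{ij}/U^*}}, \qquad k \text{ large enough}, \]

for all $i,j \in \Omega$ such that $U_{ij}^{(d)} = U_{ij}$. Hence for every
large enough $k \in \mathbb{N}$

\begin{align*}
P\{x_{nd} \in J,\ n \ge k\}
&\le \prod_{n=k}^{\infty} \max_{j \in J} P\{x_{(n+1)d} \in J \mid x_{nd} = j\} \\
&= \prod_{n=k}^{\infty} \left[ 1 - \min_{j \in J} \sum_{i \in I} p_{ji}^{(nd,(n+1)d)} \right] \\
&\le \prod_{n=k}^{\infty} \left[ 1 - \min_{j \in J} \sum_{\substack{i \in I \\ U_{ji}^{(d)} = U_{ji}}} \frac{a}{(nd)^{U_{ji}/U^*}} \right] \\
&\le \prod_{n=k}^{\infty} \left[ 1 - \min_{j \in J} \sum_{\substack{i \in I \\ U_{ji} = U_{jI}}} \frac{a}{(nd)^{U_{ji}/U^*}} \right] \\
&\le \prod_{n=k}^{\infty} \left[ 1 - \frac{a}{nd} \right]
\end{align*}

by (a). Since the infinite product diverges (to zero),
$P\{x_{nd} \in J,\ n \ge k\} = 0$ for all $k \in \mathbb{N}$, and the theorem
follows.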
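The last step of this argument rests on the fact that a product of factors $1 - a/(nd)$ tends to zero because $\sum 1/n$ diverges, whereas a product of factors $1 - a/(nd)^{\beta}$ with $\beta > 1$ would stay bounded away from zero. The following sketch illustrates that dichotomy numerically; the values of `a`, `d`, and the exponents are arbitrary and the function name is an assumption.

```python
def tail_product(a, d, exponent, start, stop):
    """Partial product of (1 - a / (n*d)**exponent) for n = start, ..., stop - 1."""
    result = 1.0
    for n in range(start, stop):
        result *= 1.0 - a / (n * d) ** exponent
    return result

# Illustrative values only.
a, d = 0.5, 1
for stop in (10, 1000, 100000):
    harmonic_case = tail_product(a, d, 1.0, 1, stop)   # exponent 1: tends to 0
    summable_case = tail_product(a, d, 2.0, 1, stop)   # exponent 2: stays positive
    print(stop, harmonic_case, summable_case)
```

With exponent 1 the partial products keep shrinking as more factors are included, while with exponent 2 they settle at a positive value.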

Remarks on Theorem 4.1 (1) In Figure 2.1 let $I = S^* = \{5\}$,
$J = \{1,2,3,4\}$. Then $U^* = U_{15} = 4$.

(2) Condition (a) was discussed in Section 2 and is satisfied for $I = S$.

Our next theorem gives conditions under which $P\{x_k \in S\} \to 1$ as
$k \to \infty$ by setting $I = S$, and obtains an estimate of the rate of
convergence as well. We shall need the following lemma, the proof of which can
be found in the Appendix.

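Lemma Let $a > 0$, $0 < \alpha < 1$, $\beta > \alpha$, $k_0, m_0 \in \mathbb{N}$, and $a/k_0^{\alpha} < 1$. Then

(i)
$$\prod_{\ell=k_0}^{k} \Big[ 1 - \frac{a}{\ell^{\alpha}} \Big] = O\big(e^{-bk^{1-\alpha}}\big), \qquad \text{as } k \to \infty,$$
where $b = a/(1-\alpha) > 0$,

(ii) for every $n \in \mathbb{N}_0$
$$\sum_{m=k_0}^{k} \frac{(k+1-m)^n}{m^{\beta}} \prod_{\ell=m+m_0}^{k} \Big[ 1 - \frac{a}{\ell^{\alpha}} \Big] = O\big(k^{-\gamma}\big), \qquad \text{as } k \to \infty,$$
where $\gamma = \beta - \alpha > 0$.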
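Part (i) of the Lemma can be checked numerically. The sketch below compares the product with the envelope $e^{-bk^{1-\alpha}}$ for $k_0 = 1$; in that case the appendix argument gives the explicit constant $e^{b}$, so the ratio should never exceed $e^{b}$. The values of `a` and `alpha` are hypothetical and serve only as an illustration.

```python
import math

def lemma_product(a, alpha, k0, k):
    """prod_{l=k0}^{k} (1 - a / l**alpha); the Lemma assumes a / k0**alpha < 1."""
    if a / k0 ** alpha >= 1:
        raise ValueError("need a / k0**alpha < 1 so that every factor is positive")
    return math.exp(sum(math.log1p(-a / l ** alpha) for l in range(k0, k + 1)))

def envelope(a, alpha, k):
    """exp(-b * k**(1 - alpha)) with b = a / (1 - alpha), as in Lemma (i)."""
    b = a / (1.0 - alpha)
    return math.exp(-b * k ** (1.0 - alpha))

if __name__ == "__main__":
    a, alpha, k0 = 0.5, 0.5, 1  # hypothetical constants
    for k in (10, 100, 1_000, 10_000):
        ratio = lemma_product(a, alpha, k0, k) / envelope(a, alpha, k)
        print(k, ratio)  # stays bounded, consistent with the O(.) statement
```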

Theorem 4.2 Let $\{I,J\}$ be a partition of $\Omega$ and assume

(a) there exists $d \in \mathbb{N}$ such that the d-step transition energy from $j$ to $I$ equals the transition energy from $j$ to $I$, for all $j \in J$ ($U_{jI}^{(d)} = U_{jI}$ for all $j \in J$),

(b) every $j \in J$ can reach some $i \in I$ ($\max_{j \in J} U_{jI} < \infty$),

(c) the transition energy from $I$ to $j$ is greater than the transition energy from $j$ to $I$, for all $j \in J$ ($\min_{j \in J} [U_{Ij} - U_{jI}] > 0$).

Also let $U^* = \max_{j \in J} U_{jI} < \infty$, $T > U^*$, and $T_k = T / \log k$ for large enough $k \in \mathbb{N}$. Then $P\{x_k \in I\} \to 1$ as $k \to \infty$. Furthermore, if we assume

(d) there exists $i \in I$ which can reach some $j \in J$ ($U_{IJ} < \infty$),

then

$$P\{x_k \in I\} = 1 - O\big(k^{-\gamma/T}\big), \qquad \text{as } k \to \infty,$$

where $\gamma = \min_{j \in J} [U_{Ij} - U_{jI}]$ ($0 < \gamma < \infty$ by (c) and (d)).


Proof From Proposition 4.1 there exists $a_1 > 0$ such that

$$p_{ij}^{(k,k+d)} \le \frac{a_1}{k^{U_{ij}/T}}, \qquad k \in \mathbb{N}, \tag{4.2}$$

for all $i,j \in \Omega$. Also from Proposition 4.1 there exists $a_2 > 0$ such that

$$p_{ij}^{(k,k+d)} \ge \frac{a_2}{k^{U_{ij}^{(d)}/T}}, \qquad k \text{ large enough}, \tag{4.3}$$

for all $i,j \in \Omega$ such that $U_{ij}^{(d)} = U_{ij}$. In the sequel (4.2) ((4.3)) will be used to upper (lower) bound the probability of transitions from $I$ to $J$ ($J$ to $I$).
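The polynomial form of the bounds (4.2) and (4.3) reflects a simple identity: under the schedule $T_k = T/\log k$, a Boltzmann-type factor $e^{-U/T_k}$ equals $k^{-U/T}$. The sketch below verifies this identity numerically. It assumes, as is usual for annealing chains, that the probability of crossing an energy barrier $U$ at temperature $T_k$ scales like $e^{-U/T_k}$; the function names and the values of `U` and `T` are hypothetical.

```python
import math

def log_schedule(T, k):
    """Logarithmic cooling schedule T_k = T / log k (defined for k >= 2)."""
    if k < 2:
        raise ValueError("T / log k is only defined for k >= 2")
    return T / math.log(k)

def boltzmann_factor(U, T, k):
    """exp(-U / T_k) evaluated along the logarithmic schedule."""
    return math.exp(-U / log_schedule(T, k))

if __name__ == "__main__":
    U, T = 4.0, 5.0  # hypothetical barrier height and schedule constant
    for k in (2, 10, 1_000):
        # the two columns agree up to floating-point rounding
        print(k, boltzmann_factor(U, T, k), k ** (-U / T))
```

This is why the ratio $U/T$ appears as an exponent of $k$ throughout the section: the conditions $T > U^*$, $U_*/T > 1$ and so on are statements about whether the corresponding power of $k$ is summable.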


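Let $J_1, \ldots, J_{r_0}$ be a partition of $J$ such that $U_{jI} = U_{J_r I}$ for all $j \in J_r$, and $U_{J_r I} < U_{J_s I}$ for all $r < s$. For example, in Figure 2.1 let $I = S^* = \{5\}$, $J = \{1,2,3,4\}$, so that $J_1 = \{4\}$, $J_2 = \{2,3\}$, and $J_3 = \{1\}$. Also let $\alpha = U^*/T$,

$$\alpha_r = U_{J_r I}/T, \qquad \beta_r = U_{I J_r}/T, \qquad K_r = \bigcup_{s=1}^{r} J_s, \qquad \kappa_r = |K_r|,$$

for $r = 1, \ldots, r_0$. Note that $\alpha_{r_0} = \alpha < 1$ and $K_{r_0} = J$. Finally let

$$p(j,m,n,r) = P\{x_{kd} \in K_r,\ k = m+1, \ldots, n \mid x_{md} = j\},$$

and

$$q(i,j,m,n,r) = P\{x_{kd} \in K_r,\ k = m+1, \ldots, n-1;\ x_{nd} = j \mid x_{md} = i\},$$

for $i,j \in \Omega$, $m,n \in \mathbb{N}$, and $r = 1, \ldots, r_0$. Then for every $k_0 \in \mathbb{N}$ we can write

$$P\{x_{kd} \in J\} = P_1^{(k)} + P_2^{(k)}, \tag{4.4}$$

where

$$P_1^{(k)} = \sum_{j \in J} p_j^{(k_0 d)}\, p(j,k_0,k,r_0) \tag{4.5}$$

and

$$P_2^{(k)} = \sum_{m=k_0}^{k-1} \sum_{i \in I} p_i^{(md)} \sum_{j \in J} p_{ij}^{(md,(m+1)d)}\, p(j,m+1,k,r_0),$$

for all $k = k_0, k_0+1, \ldots$. In words, $P_1^{(k)}$ is the probability that $x_{nd} \in J$ for all $n = k_0, \ldots, k$, and $P_2^{(k)}$ is the probability that $x_{md} \in I$ for some $m = k_0, \ldots, k-1$ and $x_{nd} \in J$ for all $n = m+1, \ldots, k$. We can further write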


$$P_2^{(k)} = P_3^{(k)} + P_4^{(k)}, \tag{4.6}$$

where

$$P_3^{(k)} = \sum_{m=k_0}^{k-1} \sum_{i \in I} p_i^{(md)} \sum_{r=1}^{r_0} \sum_{j \in J_r} p_{ij}^{(md,(m+1)d)}\, p(j,m+1,k,r) \tag{4.7}$$

and

$$P_4^{(k)} = \sum_{m=k_0}^{k-2} \sum_{i \in I} p_i^{(md)} \sum_{n=m+2}^{k} \sum_{r=2}^{r_0} \sum_{j \in J_r} q(i,j,m,n,r-1)\, p(j,n,k,r), \tag{4.8}$$

for all $k = k_0, k_0+1, \ldots$. In words, $P_3^{(k)}$ ($P_4^{(k)}$) is the probability that when $x_{nd}$ makes the transition from $I$ to $J$ at time $m$ it visits at time $m+1$ (at some time $\ge m+2$) the state in $J$ with the largest transition energy back to $I$ amongst the states in $J$ that are visited from time $n = m+1$ to time $n = k$.

The motivation for the decomposition in (4.6) is as follows. Suppose we work directly with (4.4). Observe that the $P_2^{(k)}$ term only keeps track of how the chain makes transitions from $I$ to $J$ but not how it stays in $J$. In this case we are forced to work with the "worst case" scenario where the chain makes minimum energy d-step transitions from $I$ to $J$ (with energy $U_{IJ}$) and maximum energy d-step transitions from $J$ to $I$ (with energy $\max_{j \in J} U_{jI}$). In order to show that $P_2^{(k)} \to 0$ as $k \to \infty$ it seems clear that we would have to require $U_{IJ} - \max_{j \in J} U_{jI} > 0$. On the other hand, in the $P_3^{(k)}$ and $P_4^{(k)}$ terms of (4.6) we not only keep track of how the chain makes transitions from $I$ to $J$ but also how it stays in $J$. In order to show that $P_3^{(k)}, P_4^{(k)} \to 0$ (and consequently $P_2^{(k)} \to 0$) as $k \to \infty$ it is not hard to see that we need only require $\min_{j \in J} [U_{Ij} - U_{jI}] > 0$, which is guaranteed by (c). We now proceed with the details.

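We start by upper bounding $P_1^{(k)}$. Using (4.3), for every large enough $k_0 \in \mathbb{N}$ we have

$$\begin{aligned}
p(j_0,k_0,k,r_0) &\le \prod_{\ell=k_0}^{k-1} \max_{j \in J} P\{x_{(\ell+1)d} \in J \mid x_{\ell d} = j\} \\
&= \prod_{\ell=k_0}^{k-1} \Big[ 1 - \min_{j \in J} \sum_{i \in I} p_{ji}^{(\ell d,(\ell+1)d)} \Big] \\
&\le \prod_{\ell=k_0}^{k-1} \Big[ 1 - \min_{j \in J} \sum_{i \in I:\ U_{ji}^{(d)} = U_{jI}} \frac{a_2}{(\ell d)^{U_{ji}^{(d)}/T}} \Big] \\
&\le \prod_{\ell=k_0}^{k-1} \Big[ 1 - \frac{a_2}{(\ell d)^{\alpha}} \Big], \qquad j_0 \in J,\ k = k_0, k_0+1, \ldots,
\end{aligned} \tag{4.9}$$

by (a). Combining (4.5) and (4.9) gives for every large enough $k_0 \in \mathbb{N}$

$$P_1^{(k)} \le \prod_{\ell=k_0}^{k-1} \Big[ 1 - \frac{a_2}{(\ell d)^{\alpha}} \Big], \qquad k = k_0, k_0+1, \ldots. \tag{4.10}$$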

Since $a_2 > 0$ and $\alpha < 1$ we can apply Lemma (i) to (4.10) for every large enough $k_0 \in \mathbb{N}$ to get

$$P_1^{(k)} = O\big(e^{-b(kd)^{1-\alpha}}\big), \qquad \text{as } k \to \infty, \tag{4.11}$$

where $b = a_2/(1-\alpha) > 0$.


We continue by upper bounding $P_3^{(k)}$ and $P_4^{(k)}$. First, by almost the same reasoning that led to (4.9), for every large enough $n \in \mathbb{N}$ we have

$$p(j,n,k,r) \le \prod_{\ell=n}^{k-1} \Big[ 1 - \frac{a_2}{(\ell d)^{\alpha_r}} \Big], \qquad j \in J,\ k = n, n+1, \ldots,\ r = 1, \ldots, r_0. \tag{4.12}$$

Next ,  suppose  that 


for  k  -  (m+l)d . (n-l)d. 


for  some  i,j  6  (1,  in  6  IN,  n  -  m+2,m+3 .  and  r  -  1 . rQ.  Then  clearly 

there  exists  k  e  IN  (1  <  k  <  min[n-m-l  ,krl ) ,  Intermediate  times  m  <  ^ 


<...<  lk_1  <  n-1,  and  distinct  intermediate  states  . e  Kp  such  that 

zmd  "  if  z(m+l)d  ” 


"  Je. 


z(ie+l)d  "  ^8  +  1. 


for  8  -  1 . k-1. 


x(n~l)d  "  Jk-  2nd  "  J ' 
Let  A(i,  J  ,m,n,r;k,i1,  .  .  .  ,i]c_1. 


(4.13) 


,  J.)  be  the  event  defined  by  (4.13). 


Then  we  have  shown  that 


J  (i.  J  ,m 


,n,r)  <  ) 

1  Li 
X1 . Ak-1  ’ 


P{A(i,  J  ,m,n,r;k,i1 . ^-l”5! . Jk)} 


Ji . Jv  k 


k  -i  k-1 

<  k^  2  (n-m-2)  r 


max  P{A(i,  j,m,n,r;k,i1 . i^.Jj . Jfc)}. 

X1 . 1k-l 


Ji . Jv.k 


i.J  6  n,  n  -  m+2,m+3 .  m  e  IN. 


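apply Lemma (ii) to each term in (4.15) for every large enough $k_0 \in \mathbb{N}$ to get

$$P_3^{(k)} + P_4^{(k)} = \sum_{r=1}^{r_0} O\big(k^{-(\beta_r - \alpha_r)}\big) = O\big(k^{-\gamma/T}\big), \qquad \text{as } k \to \infty, \tag{4.16}$$

where the last equality follows from

$$\gamma = \min_{j \in J} [U_{Ij} - U_{jI}] = \min_{r=1,\ldots,r_0} [U_{I J_r} - U_{J_r I}] = \min_{r=1,\ldots,r_0} (\beta_r - \alpha_r) T.$$

Finally, combining (4.4), (4.6), (4.11), and (4.16) gives

$$P\{x_{kd} \in J\} = O\big(e^{-b(kd)^{1-\alpha}}\big) + O\big(k^{-\gamma/T}\big), \qquad \text{as } k \to \infty. \tag{4.17}$$

Similarly we can show that in (4.17) $P\{x_{kd} \in J\}$ can be replaced by $P\{x_{kd+k_0} \in J\}$, for all $k_0 = 0, \ldots, d-1$. Hence

$$P\{x_k \in J\} = O\big(e^{-bk^{1-\alpha}}\big) + O\big(k^{-\gamma/T}\big), \qquad \text{as } k \to \infty, \tag{4.18}$$

and the Theorem follows since $b, \gamma > 0$ (and $\gamma < \infty$ if (d) is true).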

Remarks on Theorem 4.2 (1) In Figure 2.1 let $I = S^* = \{5\}$, $J = \{1,2,3,4\}$. Then $U^* = U_{15} = 4$ and $\gamma = U_{51} - U_{15} = U_1 - U_5 = 1$.

(2) Condition (a) was discussed in Section 2 and is satisfied for $I = S^*$.


(3) Condition (c) is satisfied for $I = S^*$ and $Q$ symmetric since

$$\min_{j \in J} [U_{Ij} - U_{jI}] \ge \min_{i \in I,\ j \in J} [U_{ij} - U_{ji}] = \min_{i \in I,\ j \in J} [U_j - U_i] > 0.$$


(4) When condition (d) is not satisfied ($\gamma = \infty$), (4.18) shows that

$$P\{x_k \in I\} = 1 - O\big(e^{-bk^{1-\alpha}}\big), \qquad \text{as } k \to \infty,$$

where $\alpha = U^*/T$ and $b > 0$. What we have actually shown is that

$$P\{x_n \in I,\ \text{some } n \le k\} = 1 - O\big(e^{-bk^{1-\alpha}}\big), \qquad \text{as } k \to \infty,$$

and this is valid when only (a), (b), $T > U^*$, and $T_k \ge T / \log k$ for large enough $k \in \mathbb{N}$ are assumed. Theorem 4.1 can be deduced from this by taking


U  .  It  is  possible  to  lower  bound  b  in  terms  of  the  aij  ’ s 


Proposition  4.1,  but  we  shall  not  do  so  here. 

(5) We can get a somewhat better estimate of the rate of convergence as follows. Let $\mathcal{I}$ be the collection of subsets of $I$ such that $I_0 \in \mathcal{I}$ iff the partition $\{I_0, J_0\}$ satisfies conditions (a), (b), (c), and (d). Assume that $\mathcal{I} \ne \emptyset$ and let

$$\frac{\gamma(I_*)}{U^*(I_*)} = \max_{I_0 \in \mathcal{I}} \frac{\gamma(I_0)}{U^*(I_0)},$$

$\gamma^* = \gamma(I_*)$, $U^{**} = U^*(I_*)$, $T > U^{**}$, and $T_k = T / \log k$ for large enough $k \in \mathbb{N}$. Then

$$P\{x_k \in I\} = 1 - O\big(k^{-\gamma^*/T}\big), \qquad \text{as } k \to \infty.$$
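To get a feel for the rate $O(k^{-\gamma/T})$ in Theorem 4.2, the sketch below computes the smallest $k$ for which $k^{-\gamma/T}$ drops below a tolerance. It uses the Figure 2.1 values $U^* = 4$ and $\gamma = 1$ from Remark (1); the schedule constant `T = 4.5` and the tolerance are hypothetical. Since the $O(\cdot)$ hides an unknown constant, the result only indicates how the required number of steps scales, not an actual guarantee.

```python
import math

def steps_for_tolerance(gamma, T, U_star, delta):
    """Smallest integer k with k**(-gamma / T) <= delta.

    Illustrates the scaling of the Theorem 4.2 rate; the hidden O(.) constant
    is ignored, so this is not a bound on the true error probability.
    """
    if T <= U_star:
        raise ValueError("Theorem 4.2 assumes T > U*")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.ceil(delta ** (-T / gamma))

if __name__ == "__main__":
    # gamma and U_star as in Remark (1); T and delta are hypothetical choices
    print(steps_for_tolerance(gamma=1.0, T=4.5, U_star=4.0, delta=0.1))
```

Because $T$ must exceed $U^*$, the exponent $\gamma/T$ is below $1/4$ in this example, so each further factor of ten in accuracy multiplies the number of steps by more than $10^4$.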

The corollary to the next theorem gives conditions under which $P\{x_k \in S^* \text{ a.a.}\} = 1$ by setting $I = S^*$.

Theorem 4.3 Let $\{I,J\}$ be a partition of $\Omega$ and assume that the transition energy from $I$ to $J$ is positive ($U_{IJ} > 0$). Also let $U_* = U_{IJ} > 0$, $\epsilon > 0$, and $T_k \le (U_* - \epsilon) / \log k$ for large enough $k \in \mathbb{N}$. Then $P\{x_k \in I \text{ a.a.}\} = P\{x_k \in I \text{ i.o.}\}$.

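Proof Let $T = U_* - \epsilon$. Then from Proposition 4.1(i) there exists $a > 0$ such that

$$p_{ij}^{(k,k+1)} \le \frac{a}{k^{U_{ij}/T}}, \qquad k \in \mathbb{N},$$

for all $i,j \in \Omega$. Hence

$$P\{x_k \in I,\ x_{k+1} \in J\} \le \max_{i \in I} P\{x_{k+1} \in J \mid x_k = i\} = \max_{i \in I} \sum_{j \in J} p_{ij}^{(k,k+1)} \le \max_{i \in I} \sum_{j \in J} \frac{a}{k^{U_{ij}/T}}, \qquad k \in \mathbb{N},$$

and since $U_*/T > 1$,

$$\sum_{k=1}^{\infty} P\{x_k \in I,\ x_{k+1} \in J\} < \infty.$$

Applying the "first" Borel-Cantelli Lemma (c.f. [7]) we have $P\{x_k \in I,\ x_{k+1} \in J \text{ i.o.}\} = 0$, and the theorem follows.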
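The Borel-Cantelli step in the proof of Theorem 4.3 needs the series $\sum_k k^{-U_*/T}$ to converge, which holds exactly when $U_*/T > 1$. The sketch below contrasts partial sums for an exponent above 1 with the borderline exponent 1. The exponent `1.25` corresponds to hypothetical values such as $U_* = 5$, $\epsilon = 1$, $T = 4$, which are not taken from the paper.

```python
def partial_sum(p, n_terms):
    """Return sum_{k=1}^{n_terms} k**(-p)."""
    return sum(k ** (-p) for k in range(1, n_terms + 1))

if __name__ == "__main__":
    for n_terms in (10, 1_000, 100_000):
        summable = partial_sum(1.25, n_terms)   # stays below 1 + 1/(p - 1) = 5
        borderline = partial_sum(1.0, n_terms)  # harmonic sum, grows like log n
        print(n_terms, summable, borderline)
```

The bound $1 + 1/(p-1)$ follows from comparing the sum with $\int_1^\infty x^{-p}\,dx$; for $p \le 1$ no such bound exists, which is why the theorem requires the temperature to fall strictly faster than $U_*/\log k$.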

Corollary 4.1 Let $\{I,J\}$ be a partition of $\Omega$ and assume that

(a) there exists $d \in \mathbb{N}$ such that the d-step transition energy from $j$ to $I$ equals the transition energy from $j$ to $I$, for all $j \in J$ ($U_{jI}^{(d)} = U_{jI}$ for all $j \in J$),

(b) every $j \in J$ can reach some $i \in I$ ($\max_{j \in J} U_{jI} < \infty$),

(c) the transition energy from $I$ to $J$ is greater than the transition energy from $j$ to $I$, for all $j \in J$ ($U_{IJ} - \max_{j \in J} U_{jI} > 0$).

Also let $U^* = \max_{j \in J} U_{jI} < \infty$, $U_* = U_{IJ} > 0$, $U^* \le T < U_*$, and $T_k = T / \log k$ for large enough $k \in \mathbb{N}$. Then $P\{x_k \in I \text{ a.a.}\} = 1$.

Proof Combine Theorems 4.1 and 4.3.

Remarks on Corollary 4.1 (1) In Figure 2.1 let $I = S^* = \{5\}$, $J = \{1,2,3,4\}$. Then $U^* = U_{15} = 4$ and $U_* = U_{54} = 4$. Hence, unlike condition (c) of Theorem 4.2, condition (c) of Corollary 4.1 is not generally satisfied, even when $I = S^*$ and $Q$ is symmetric.

(2) Note that

$$U_{IJ} = \min_{i \in I,\ j \in J,\ q_{ij} > 0} \max[0,\ U_j - U_i].$$

The corollary to the next theorem gives conditions under which $P\{x_k \in S^* \text{ i.o.}\} < 1$ by setting $I = S^*$. By (4.1), these are conditions under which the algorithm does not converge according to any of our criteria.

Theorem 4.4 Let $\{I,J\}$ be a partition of $\Omega$ and assume

(a) the transition energy from $J$ to $I$ is positive ($U_{JI} > 0$),

(b) every $i \in I$ can reach some $j \in J$ ($\max_{i \in I} U_{iJ} < \infty$).

Also let $\epsilon > 0$ and $T_k \le (U_{JI} - \epsilon) / \log k$ for large enough $k \in \mathbb{N}$. Then $P\{x_k \in I \text{ i.o.}\} < 1$.


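Proof Let $T = U_{JI} - \epsilon$. From Proposition 4.1(i) there exists $a > 0$ such that

$$p_{ij}^{(k,k+1)} \le \frac{a}{k^{U_{ij}/T}}, \qquad k \in \mathbb{N},$$

for all $i,j \in \Omega$. Hence for every large enough $k \in \mathbb{N}$

$$\begin{aligned}
P\{x_n \in J,\ n \ge k\} &\ge P\{x_k \in J\} \prod_{n=k}^{\infty} \min_{j \in J} P\{x_{n+1} \in J \mid x_n = j\} \\
&= P\{x_k \in J\} \prod_{n=k}^{\infty} \Big[ 1 - \max_{j \in J} \sum_{i \in I} p_{ji}^{(n,n+1)} \Big] \\
&\ge P\{x_k \in J\} \prod_{n=k}^{\infty} \Big[ 1 - \max_{j \in J} \sum_{i \in I} \frac{a}{n^{U_{ji}/T}} \Big] \\
&\ge P\{x_k \in J\} \prod_{n=k}^{\infty} \Big[ 1 - \frac{a|I|}{n^{U_{JI}/T}} \Big].
\end{aligned}$$

Since $U_{JI}/T > 1$ the infinite product converges (to a positive value), and by (b) $P\{x_k \in J\} > 0$ for infinitely many $k \in \mathbb{N}$. Hence $P\{x_n \in J,\ n \ge k\} > 0$ for some large enough $k \in \mathbb{N}$, and the theorem follows.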

Corollary 4.2 Let $\{I,J\}$ be a partition of $\Omega$ and assume that

(a) the transition energy from some $j \in J$ to $I$ is positive ($\max_{j \in J} U_{jI} > 0$).

Also let $W^* = \max_{j \in J} W_{jI} > 0$, $J^* = \{j \in J : W_{jI} = W^*\}$, $I^* = \Omega \setminus J^*$, and assume that

(b) the transition energy from $J^*$ to $I^*$ is positive ($U_{J^* I^*} > 0$).

Finally let $\epsilon > 0$ and $T_k \le (W^* - \epsilon) / \log k$ for large enough $k \in \mathbb{N}$. Then $P\{x_k \in I \text{ i.o.}\} < 1$.

Proof Observe that $W^* = U_{J^* I^*}$ and apply Theorem 4.4 to the partition $\{I^*, J^*\}$.

In Figure 2.1 let $I = S^* = \{5\}$, $J = \{1,2,3,4\}$. Then $W^* = W_{15} = W_{25} = W_{35} = 2$ and $J^* = \{1,2,3\}$.

We next state a theorem of Hajek's which gives necessary and sufficient conditions for $P\{x_k \in S^*\} \to 1$ as $k \to \infty$.

Theorem (Hajek) Assume that

(a) $i$ can be reached from $j$, for all $i,j \in \Omega$ ($Q$ is irreducible),

(b) if $i$ can be reached from $j$ at energy $U$ then $j$ can be reached from $i$ at energy $U$, for all $i,j \in \Omega$ and $U \in \mathbb{R}$ ($U_i + W_{ij} = U_j + W_{ji}$ for all $i,j \in \Omega$).

Let $d^* = \max_{j \notin S^*} V_{jS^*}$ and $T_k = T / \log k$ for large enough $k \in \mathbb{N}$. Then $P\{x_k \in S^*\} \to 1$ as $k \to \infty$ iff $T \ge d^*$.

Proof See [9].

Remarks on Hajek's Theorem (1) In Figure 2.1 we have $d^* = V_{15} = 3$.

(2) In Hajek's paper conditions (a) and (b) are called "strong irreducibility" and "weak reversibility", respectively. Condition (b) is satisfied for $Q$ symmetric.

(3) Obviously $W^* \le d^* \le U^*$ and the equalities hold only in fairly trivial cases. Hence under conditions (a) and (b), Hajek's Theorem is stronger than our Theorem 4.2 and Corollary 4.2 with $I = S^*$. However, the conditions under which our results are obtained are different, and in general weaker than Hajek's, with the exception that condition (c) of Theorem 4.2 can be true when condition (b) of Hajek's Theorem is false and conversely. Also we obtain an estimate of the rate at which $P\{x_k \in S^*\} \to 1$ as $k \to \infty$.

We  close  this  section  by  indicating  how  we  can  analyze  various 
modifications  of  the  annealing  algorithm  by  our  methods.  Such  modifications 
might  include 

(i)  allowing  the  Q  matrix  to  depend  on  time, 

(ii)  measuring  the  energy  differences  with  random  error, 

(iii) allowing the temperature $T_k$ to depend on the current state $x_k$.

The important point to observe in modifications such as these is that our results depend only on the Markov property of the annealing chain $\{x_k\}_{k \in \mathbb{N}_0}$ and the asymptotic behavior of its d-step transition matrix $\{p_{ij}^{(k,k+d)}\}$ as $k \to \infty$ for fixed $d \in \mathbb{N}$. In particular, our results are based on $p_{ij}^{(k,k+d)}$ satisfying one or both of the inequalities

$$\liminf_{k \to \infty} k^{U_{ij}/T}\, p_{ij}^{(k,k+d)} > 0, \tag{4.19}$$

$$\limsup_{k \to \infty} k^{U_{ij}/T}\, p_{ij}^{(k,k+d)} < \infty, \tag{4.20}$$

for appropriate $i,j \in \Omega$. Hence our results are valid for any Markov chain which satisfies (4.19) and/or (4.20) for appropriate $i,j \in \Omega$. Of course in general the $U_{ij}$'s are not given by (2.1), and can in fact be any non-negative real numbers (or $\infty$), with the exception that in Theorem 4.2 we require $U_{ij} \le \ldots$ for certain $i,j,\ell \in \Omega$. We are currently examining the modifications of the annealing algorithm mentioned above and are also attempting to extend our results to more general (countably infinite and uncountable) state spaces.


We have analyzed the simulated annealing algorithm focusing on those issues most important for optimization. Here we are interested in finding good but not necessarily optimal solutions. We distinguished between the finite-time and asymptotic behavior of the annealing algorithm. In our finite-time analysis we gave a lower bound on the probability that the annealing chain visits a set of low energy states at some time $\le k$, for $k = 1,2,\ldots$. This bound may be useful even when the algorithm does not converge and as such is probably our most important result for applications. We are currently engaged in trying to apply this bound to a specific problem. In our asymptotic analysis we obtained conditions under which the annealing algorithm converges to a set of low energy states according to various criteria. Hajek has recently given necessary and sufficient conditions that the annealing chain converge in probability to the minimum energy states. We gave an estimate of the rate of convergence. Our methods apply to various modifications of the annealing algorithm. We hope to explore some of these modifications and to extend our results to more general state spaces.


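Proof of Lemma (i) Without loss of generality we assume $k_0 = 1$. Then using the inequality $1 + x \le e^x$ for all $x \in \mathbb{R}$ we have

$$\prod_{\ell=1}^{k} \Big[ 1 - \frac{a}{\ell^{\alpha}} \Big] \le \exp\Big[ -a \sum_{\ell=1}^{k} \frac{1}{\ell^{\alpha}} \Big] \le \exp\Big[ -a \int_1^{k} \frac{dx}{x^{\alpha}} \Big] = e^{b} e^{-bk^{1-\alpha}}, \qquad k \in \mathbb{N}. \tag{A.1}$$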


Proof  of  Lemma  Cli)  Without  loss  of  generality  we  assume  kQ  -  mQ  «  1. 
Then  using  (A.l)  and  the  inequality  (x+1)^  <  x^  +  y  for  all  x  >  1  and  0 


<  y  <1  we  have 


e -m+i 1  e 


rj  |l  -  s-l  <  a«-»we'btl'a  <  eV“H.toH, 


k  -  m+1 ,m+2 .  m  €  IN  . 


*  l 


(k+l-m)n  „bm1-a 

‘  K -  Q 


?  -  1 . k,  keiN,  ns  INQ. 


Then  we  can  write 


(k+l-m) 


n  k 


<> -m+1 


1  -  =-  <  e  e 


a  -bk 


fa(k.k),  k  s  IN,  ns  INq. 


We  shall  show  that  for  every  n  s  INq  there  exists  an*bn  6  IR  such  that 


ouww  uuav  111  q  wwji  u 

f.Ck.e)  <  a  .*l‘“  ,  b  V>. 

n  n  a 


8  -  1 . k,  k  6  IN. 


(A. 2) 


and  consequently 
(k+l-m)


n  k 


C-m+1 


1  -  =r  -  OCk^). 


as  k  -+  ®, 


as  required. 

Proof  of  (A. 2)  is  by  Induction  on  n  s  INq.  First  consider  n  -  0.  Let 
g(x)  -  ebx/x^,  x  >  1.  Since  g'  (x)  >  0  for  large  enough  x,  it  follows 


i  bm1-a 

(k.C)  -  ^  -  <  |^g(x)dx  +  gC 1 )  +  g(m) 


*  ebx 
dx  +  eb  +  —
1  x°
where $\delta = (\beta - \alpha)/(1 - \alpha) = \gamma/(1 - \alpha) > 0$. Let $\lfloor \delta \rfloor$ be the largest integer $\le \delta$. Then expanding $e^{bx}$ in a Taylor series and integrating term by term we have


fQCk.C)  < 
where $a_0 = 1 + (1/a)[(\lfloor \delta \rfloor + 1)/(\lfloor \delta \rfloor + 1 - \delta)]$ and $b_0 = e^{b}$.

Next assume (A.2) is valid for $n \in \mathbb{N}_0$ and consider $n+1$. Summing by parts (c.f. [10]) we have

e-i 

fn+i(k,e )  -  (k+l-e)fn(k,C)  +  ^  V*.®) 

m-l 

(fl-t)**1  eWH  ♦ 
if  we  set  a_.  -  a„(a  +1)  and  b„^,  -  b  (a  +2).  By  induction  (A. 2)  is
n+i  n  n  a+i  n  n 


valid  for  all  a  6  IN«. 